
    Variable Selection, Sparse Meta-Analysis and Genetic Risk Prediction for Genome-Wide Association Studies

    Genome-wide association studies (GWAS) usually involve more than half a million single nucleotide polymorphisms (SNPs). The common practice of analyzing one SNP at a time does not fully realize the potential of GWAS to identify multiple causal variants and to predict risk of disease. Recently developed variable selection methods allow joint analysis of GWAS data, but they tend to miss causal SNPs that are marginally uncorrelated with disease and have high false discovery rates (FDRs). Genetic risk prediction becomes highly challenging when the number of causal variants is large and many of the effects are weak. Existing methods mostly rely on marginal regression estimates, and their prediction power is quite limited. In meta-analysis, the involvement of multiple studies adds one more layer of complexity to variable selection. While existing variable selection methods can potentially be applied to meta-analysis, they require direct access to raw data, which are often difficult to obtain. In the first part of this dissertation, we introduce GWASelect, a statistically powerful and computationally efficient variable selection method for analyzing GWAS data. This method searches iteratively over the potential SNPs conditional on previously selected SNPs and is thus capable of capturing causal SNPs that are marginally correlated with disease as well as those that are marginally uncorrelated with disease. A special resampling mechanism is built into the method to reduce false-positive findings. Simulation studies demonstrate that GWASelect performs well under a wide spectrum of linkage disequilibrium patterns and can be substantially more powerful than existing methods in capturing causal variants while having a lower FDR. In addition, the regression models based on GWASelect tend to yield more accurate prediction of disease risk than existing methods.
In the second part, we propose a new approach, Sparse Meta-Analysis (SMA), which performs variable selection for meta-analysis based solely on summary statistics and allows the effect sizes of each covariate to vary among studies. We show that the SMA enjoys the oracle property if the estimated covariance matrix of the parameter estimators from each study is available. We also consider the situations in which the summary statistics include only the variances or no variance/covariance information at all. Simulation studies and real data analysis demonstrate that the proposed methods perform well. Since summary statistics are far more accessible than raw data, our methods have broader applications in high-dimensional meta-analysis than existing ones. In the third part, we investigate the issue of genetic risk prediction when the number of true causal SNPs is large and many of the effect sizes are small. We show that the estimators obtained from marginal logistic regression can be severely biased and that using these estimators for prediction can lead to highly inaccurate results. To construct a joint-effects model, we propose a new method based on the smoothly clipped absolute deviation support vector machine (SCAD-SVM). We conduct a series of simulation studies to show that our method outperforms the methods based on marginal estimators. We further assess the performance of our method by applying it to real GWAS data. Doctor of Philosophy

    Sparse meta-analysis with high-dimensional data

    Meta-analysis plays an important role in summarizing and synthesizing scientific evidence derived from multiple studies. With high-dimensional data, the incorporation of variable selection into meta-analysis improves model interpretation and prediction. Existing variable selection methods require direct access to raw data, which may not be available in practical situations. We propose a new approach, sparse meta-analysis (SMA), in which variable selection for meta-analysis is based solely on summary statistics and the effect sizes of each covariate are allowed to vary among studies. We show that the SMA enjoys the oracle property if the estimated covariance matrix of the parameter estimators from each study is available. We also show that our approach achieves selection consistency and estimation consistency even when summary statistics include only the variance estimators or no variance/covariance information at all. Simulation studies and applications to high-throughput genomics studies demonstrate the usefulness of our approach

    Deflection in higher dimensional spacetime and asymptotically non-flat spacetimes

    Using a perturbative technique, in this work we study the deflection of null and timelike signals in the extended Einstein-Maxwell spacetime, the Born-Infeld gravity and the charged Ellis-Bronnikov (CEB) spacetime in the weak field limit. The deflection angles are found to take a (quasi-)series form of the impact parameter and automatically take into account the finite distance effect of the source and observer. The method is also applied to find the deflections in CEB spacetime with arbitrary dimension. It is shown that, to the leading non-trivial order, the deflection in some $n$-dimensional spacetimes is of the order $\mathcal{O}(M/b)^{n-3}$. We then extend the method to spacetimes that are asymptotically non-flat and study the deflection in a nonlinear electrodynamical scalar theory. The deflection angle in such asymptotically non-flat spacetimes at the trivial order is found to no longer be $\pi$. In all these cases, the perturbative deflection angles are shown to agree with numerical results extremely well. The effects of some nontrivial spacetime parameters as well as the signal velocity on the deflection angles are analyzed. Comment: 30 pages, 7 figures; title modified; to match published version in Class.Quant.Gra

    Fasting insulin concentrations and incidence of hypertension, stroke, and coronary heart disease: a meta-analysis of prospective cohort studies

    Background: Insulin resistance is a precursor of numerous chronic diseases, including cardiovascular disease (CVD). The fasting insulin concentration is considered a reasonable surrogate of insulin resistance, especially among nondiabetic individuals

    A Web-Server of Cell Type Discrimination System

    Discriminating cell types is a daily task for stem cell biologists. However, to date no user-friendly system is available for public users to discriminate the common cell types: embryonic stem cells (ESCs), induced pluripotent stem cells (iPSCs), and somatic cells (SCs). Here, we develop WCTDS, a web server for a cell type discrimination system, to discriminate the three cell types and their subtypes, such as fetal versus adult SCs. WCTDS is developed as a top-layer application of our recent publication on cell type discrimination, which employs DNA methylation as biomarkers and machine learning models to discriminate cell types. Implemented with Django, Python, R, and Linux shell programming, run on a Linux-Apache web server, and communicating through MySQL, WCTDS provides a friendly framework to efficiently receive user input, run mathematical models for analyzing data, and present the results to users. This framework is flexible and easy to extend to other applications. Therefore, WCTDS works as a user-friendly framework to discriminate cell types and subtypes, and it can also be extended to detect other cell types, such as cancer cells

    A General Framework for Association Tests With Multivariate Traits in Large-Scale Genomics Studies

    Genetic association studies often collect data on multiple traits that are correlated. Discovery of genetic variants influencing multiple traits can lead to better understanding of the etiology of complex human diseases. Conventional univariate association tests may miss variants that have weak or moderate effects on individual traits. We propose several multivariate test statistics to complement univariate tests. Our framework covers both studies of unrelated individuals and family studies and allows any type/mixture of traits. We relate the marginal distributions of multivariate traits to genetic variants and covariates through generalized linear models without modeling the dependence among the traits or family members. We construct score-type statistics, which are computationally fast and numerically stable even in the presence of covariates and which can be combined efficiently across studies with different designs and arbitrary patterns of missing data. We compare the power of the test statistics both theoretically and empirically. We provide a strategy to determine genome-wide significance that properly accounts for the linkage disequilibrium (LD) of genetic variants. The application of the new methods to the meta-analysis of five major cardiovascular cohort studies identifies a new locus (HSCB) that is pleiotropic for the four traits analyzed

    TCR-L: An Analysis Tool for Evaluating the Association Between the T-Cell Receptor Repertoire and Clinical Phenotypes

    Background: T cell receptors (TCRs) play critical roles in adaptive immune responses, and recent advances in genome technology have made it possible to examine the TCR repertoire at the individual sequence level. The analysis of the TCR repertoire with respect to clinical phenotypes can yield novel insights into the etiology and progression of immune-mediated diseases. However, methods for association analysis of the TCR repertoire have not been well developed. Methods: We introduce an analysis tool, TCR-L, for evaluating the association between the TCR repertoire and disease outcomes. Our approach is developed under a mixed-effects modeling framework, where the fixed effect represents features that can be explicitly extracted from TCR sequences while the random effect represents features that are hidden in TCR sequences and are difficult to extract. Statistical tests are developed to examine the two types of effects independently, and then the p values are combined. Results: Simulation studies demonstrate that (1) the proposed approach can control the type I error well; and (2) the power of the proposed approach is greater than that of approaches that consider the fixed effect only or the random effect only. The analysis of real data from a skin cutaneous melanoma study identifies an association between the TCR repertoire and the short/long-term survival of patients. Conclusion: TCR-L can accommodate features that can be extracted as well as features that are hidden in TCR sequences. TCR-L provides a powerful approach for identifying associations between the TCR repertoire and disease outcomes